Abstract

GARCH models are useful tools for investigating phenomena in which
volatility changes are prominent, as in most financial data.
Parameter estimation via the quasi-maximum likelihood estimator (QMLE) and
its properties are by now well understood. However, there is a gap
between practical applications and the theory, since in reality there are
usually not enough observations for the limit results to be valid
approximations. We try to fill this gap in this paper, which reveals the
properties of a recent bootstrap methodology in the context of GARCH modeling.
The results are promising: this remarkably simple method turns out to have
essentially the same limit distribution as the original estimator, with the
advantage of easy confidence interval construction, as demonstrated in the paper.

The finite-sample properties of the suggested estimators are investigated
through a simulation study, which indicates that the results are
practically applicable for sample sizes as low as a thousand. On the
other hand, the results are not fully accurate until the sample size reaches
100 thousand, but it is shown that this property is not a feature of
our bootstrap procedure only, as it is shared by the original QMLE, too.

Keywords: asymptotic distribution, bootstrap, confidence region, GARCH 
model, quasi maximum likelihood 

1 Introduction 

We investigate bootstrap estimation of the parameters of GARCH processes,
which are known to be able to capture the main stylized facts of observed
financial series. In these models, the conditional variance is expressed as a
linear function of the squared past values of the series and of its own past values.



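Definition 1 $(X_t)_{t \in \mathbb{Z}}$ is called a GARCH(p,q) process if

\[ X_t = \sqrt{h_t}\,\eta_t, \tag{1} \]

\[ h_t = \omega_0 + \sum_{i=1}^{q} \alpha_{0i} X_{t-i}^2 + \sum_{j=1}^{p} \beta_{0j} h_{t-j}, \tag{2} \]

where $\eta_t$ ($t \in \mathbb{Z}$) are i.i.d. (0,1) random variables, $\omega_0 > 0$, $\alpha_{0i} \ge 0$, $\beta_{0j} \ge 0$ for
$i = 1, \dots, q$ and for $j = 1, \dots, p$.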
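Definition 1 can be illustrated by a short simulation. The following is a minimal sketch, assuming standard Gaussian innovations and an arbitrary burn-in period; the function name `simulate_garch`, the start-up values and the burn-in length are illustrative choices and are not taken from the paper.

```python
import math
import random


def simulate_garch(omega, alphas, betas, n, burn_in=500, seed=0):
    """Simulate n values of X_t = sqrt(h_t) * eta_t as in (1)-(2).

    Assumptions (not from the paper): eta_t is standard Gaussian and the
    recursion is started from arbitrary values that a burn-in period washes out.
    """
    rng = random.Random(seed)
    q, p = len(alphas), len(betas)
    x2_lags = [omega] * q   # X_{t-1}^2, ..., X_{t-q}^2 (arbitrary start)
    h_lags = [omega] * p    # h_{t-1}, ..., h_{t-p}     (arbitrary start)
    sample = []
    for t in range(n + burn_in):
        h = (omega
             + sum(a * v for a, v in zip(alphas, x2_lags))
             + sum(b * v for b, v in zip(betas, h_lags)))
        x = math.sqrt(h) * rng.gauss(0.0, 1.0)
        if q:
            x2_lags = [x * x] + x2_lags[:-1]
        if p:
            h_lags = [h] + h_lags[:-1]
        if t >= burn_in:
            sample.append(x)
    return sample


# ARCH(1) example (p = 0, q = 1) with omega_0 = 1 and alpha_01 = 0.2.
xs = simulate_garch(1.0, [0.2], [], n=20000, seed=1)
```

For a second-order stationary ARCH(1) process the sample second moment should be close to $\omega_0 / (1 - \alpha_{01})$, which gives a quick plausibility check of the simulated path.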

It defines a stationary process for a well characterized parameter space; its most
important features are presented in Section 2.

The most important question in modeling is parameter estimation. In the case of
GARCH models, QML estimation is the most popular method. It assumes a
Gaussian distribution for the observations and provides reasonable approximations
even when the innovation $\eta_t$ has another distribution. We conclude
Section 2 by presenting the properties of this estimator.

Of course, there are other estimation methods considered in the literature. The
oldest and numerically simplest estimation method for GARCH models is
ordinary least squares (OLS). It performs worse than the QML method, and
even for ARCH models it requires moments of order 8 for the original
process (Francq and Zakoian (2010), Chapter 6). Another well-known method
is least absolute deviations (LAD) estimation, which outperforms the QML
estimator if the innovations are Student's t distributed with 3 or 4 degrees of
freedom (Peng and Yao (2003)). Ling (2007) proposed a self-weighted QML
estimator for the parameters, which is close in some aspects to our considerations.
There are also several extensions of these estimators, see Berkes and Horvath
(2004) and Francq and Zakoian (2010), Chapter 9.



Section 3 deals with the main objective of this paper, namely the investigation
of bootstrap methods. Although there are different approaches for bootstrapping
GARCH models (these will be explained in more detail in Section 3),
we suggest the multiplier bootstrap approach recently proposed by Kojadinovic
and Holmes (2011) for goodness-of-fit tests for copulas. This is a simple
generalization of the standard bootstrap procedure, where the bootstrap sample is
denoted by $(\tau_{ni} X_i)$. This method is usually called weighted bootstrap and was
investigated as early as the 1990s (Barbe and Bertail 1995, Praestgaard and
Wellner 1993).

The bootstrap weights $\tau_{ni}$ ($1 \le i \le n$, $n \ge 1$) are supposed to be independent
from the process. We show the asymptotic normality of the bootstrap estimators
under conditions which are fulfilled in the majority of practical examples. The
weighted bootstrap OLS and LAD estimators for AR(1) and ARCH processes
were investigated by Bhattacharya and Bose (2012).

Other bootstrap methods for GARCH models proposed in the literature are the
residual bootstrap (for instance, see Hall and Yao (2003)) and the block
bootstrap (Corradi and Iglesias (2008)). These are tools for constructing confidence
intervals for the parameters or for functionals of the parameters (Chen et al.
(2011), Luger (2011), Pascual et al. (2006)) and for constructing goodness-of-fit tests
(Luger (2011), Horvath et al. (2004)). Bootstrap methods are especially needed
if the errors are heavy-tailed, and this is the case in most financial applications.
Section 4 presents the results of a simulation study, where for simplicity we focus
on ARCH(1) models. Here we also investigate the small-sample properties of the
QML estimators, together with their bootstrap counterparts. This approach
is practical as both the similarities and the differences can be demonstrated. We
give the conclusions in Section 5. The proofs can be found in the Appendix.



2 GARCH models 

In this section we summarize the needed fundamentals from the theory of
GARCH processes (see Francq and Zakoian (2010) for example).
We denote the parameter vector by

\[ \theta = (\theta_1, \dots, \theta_{p+q+1})^T = (\omega, \alpha_1, \dots, \alpha_q, \beta_1, \dots, \beta_p)^T, \]

which belongs to the parameter space $\Theta = (0, \infty) \times [0, \infty)^{p+q}$.
The true value of the parameters, $\theta_0 = (\omega_0, \alpha_{01}, \dots, \alpha_{0q}, \beta_{01}, \dots, \beta_{0p})^T$, is
unknown.

Theorem 1 If there exists a GARCH(p,q) process (1)-(2), which is second-order
stationary, and if $\omega > 0$, then

\[ \sum_{i=1}^{q} \alpha_i + \sum_{j=1}^{p} \beta_j < 1. \tag{3} \]

If (3) holds, the unique strictly stationary solution of model (1)-(2) is a weak
white noise.

Definition 2 Let $(B_t)_{t \in \mathbb{Z}}$ be a strictly stationary sequence of random matrices,
and $E(\log^+ \|B_t\|) < \infty$. The (top) Ljapunov exponent of the sequence $(B_t)_{t \in \mathbb{Z}}$
is

\[ \lambda := \lim_{t \to \infty} \frac{1}{t} E\left(\log \|B_t B_{t-1} \cdots B_1\|\right). \]

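The GARCH(p,q) process can be written in the vector representation

\[ z_t = b_t + A_t z_{t-1}, \]

where $(A_t)_{t \in \mathbb{Z}}$ is a sequence of random matrices determined by the coefficients
of the model and the innovations (see Francq and Zakoian (2010) for the explicit form).

Theorem 2 Let $\lambda$ denote the Ljapunov exponent of the matrix sequence $(A_t)_{t \in \mathbb{Z}}$.
Then

\[ \lambda < 0 \iff \text{there exists a strictly stationary solution of the GARCH(p,q) model.} \]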
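The condition of Theorem 2 can be checked numerically in the simplest case. The sketch below relies on a standard fact that is not stated in this paper: for GARCH(1,1) the top Ljapunov exponent reduces to $E \log(\alpha \eta_t^2 + \beta)$. Gaussian innovations and the function name `lyapunov_garch11` are assumptions made for illustration only.

```python
import math
import random


def lyapunov_garch11(alpha, beta, n_draws=200_000, seed=0):
    """Monte Carlo estimate of E log(alpha * eta^2 + beta).

    Assumption: eta is standard Gaussian. For GARCH(1,1) this expectation
    is the top Ljapunov exponent (a standard result, not from this paper).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        eta = rng.gauss(0.0, 1.0)
        total += math.log(alpha * eta * eta + beta)
    return total / n_draws


# ARCH(1) with alpha = 0.5 (beta = 0): a negative value indicates that a
# strictly stationary solution exists.
exponent = lyapunov_garch11(0.5, 0.0)
```

A negative estimate is consistent with strict stationarity; values close to zero would need many more draws before any conclusion is drawn.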

The following theorem shows that the Ljapunov exponent, and thus strict
stationarity, is connected with the existence of moments of the GARCH
process, which will be helpful to verify the main results.

Theorem 3 Let $\lambda$ denote the Ljapunov exponent of the matrix sequence $(A_t)_{t \in \mathbb{Z}}$.
Then

\[ \lambda < 0 \implies \exists s > 0: \; E\sigma_t^{2s} < \infty, \; E X_t^{2s} < \infty, \]

where $X_t$ is the strictly stationary solution of the GARCH(p,q) model and $\sigma_t^2$ is its conditional variance.

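From now on we will concentrate on the maximum likelihood estimation of the
parameters. Assume that $\{x_1, \dots, x_n\}$ are observations from a GARCH(p,q)
process (strictly stationary solution of the model). The Gaussian quasi-likelihood
function, conditional on the initial values $x_0, \dots, x_{1-q}, \tilde\sigma_0^2, \dots, \tilde\sigma_{1-p}^2$, is

\[ L_n(\theta) = L_n(\theta; x_1, \dots, x_n) = \prod_{t=1}^{n} \frac{1}{\sqrt{2\pi\tilde\sigma_t^2}} \exp\left(-\frac{x_t^2}{2\tilde\sigma_t^2}\right), \]

where the $(\tilde\sigma_t^2)_{t \ge 1}$ are recursively defined by the following equation:

\[ \tilde\sigma_t^2 = \tilde\sigma_t^2(\theta) = \omega + \sum_{i=1}^{q} \alpha_i x_{t-i}^2 + \sum_{j=1}^{p} \beta_j \tilde\sigma_{t-j}^2. \]

The QMLE of $\theta$ is defined as the solution $\hat\theta_n$ of

\[ \hat\theta_n = \arg\max_{\theta \in \Theta} L_n(\theta). \tag{4} \]

To maximize the Gaussian likelihood function, we have to minimize the following
function:

\[ \tilde l_n(\theta) = \frac{1}{n} \sum_{t=1}^{n} \tilde l_t(\theta), \quad \text{where} \quad \tilde l_t(\theta) = \frac{x_t^2}{\tilde\sigma_t^2(\theta)} + \log \tilde\sigma_t^2(\theta). \]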
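The criterion to be minimized can be evaluated with a single pass over the data. The following is a minimal sketch for the GARCH(1,1) case; the function name `qml_criterion` and the default choice of setting both initial values to $x_1^2$ are assumptions made for illustration.

```python
import math


def qml_criterion(theta, x, init=None):
    """Average of x_t^2 / s_t + log(s_t), where s_t is the recursively
    computed conditional variance of a GARCH(1,1) model.

    theta = (omega, alpha, beta). Assumption: the initial values x_0^2 and
    s_0 are both set to `init`, which defaults to x_1^2.
    """
    omega, alpha, beta = theta
    if init is None:
        init = x[0] ** 2
    prev_x2, prev_s = init, init
    total = 0.0
    for xt in x:
        s = omega + alpha * prev_x2 + beta * prev_s
        total += xt * xt / s + math.log(s)
        prev_x2, prev_s = xt * xt, s
    return total / len(x)


# Toy usage: compare two candidate parameter vectors on a short series.
series = [0.3, -1.2, 0.8, 2.1, -0.4, 0.1, -1.7, 0.9]
value_a = qml_criterion((1.0, 0.5, 0.0), series)
value_b = qml_criterion((0.5, 0.2, 0.3), series)
```

In practice this function would be passed to a numerical optimizer over the parameter space; the smaller of the two values above only indicates which candidate fits this toy series better.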



Let $A_\theta(z)$ and $B_\theta(z)$ be the generating functions

\[ A_\theta(z) = \sum_{i=1}^{q} \alpha_i z^i, \qquad B_\theta(z) = 1 - \sum_{j=1}^{p} \beta_j z^j. \]

The following assumptions A1-A6 are sufficient for the quasi-maximum
likelihood estimator to have a Normal limit distribution (see Francq and Zakoian
(2004)):

A1: $\theta_0 \in \Theta$ and $\Theta$ is compact

A2: $\gamma(A_0) < 0$ and for all $\theta \in \Theta$, $\sum_{j=1}^{p} \beta_j < 1$

A3: $\eta_t^2$ has a nondegenerate distribution and $E\eta_t^2 = 1$

A4: If $p > 0$, $A_{\theta_0}(z)$ and $B_{\theta_0}(z)$ have no common roots,
$A_{\theta_0}(1) \ne 0$, $\alpha_{0q} + \beta_{0p} \ne 0$

A5: $\theta_0 \in \mathrm{int}(\Theta)$

A6: $\kappa_\eta = E\eta_t^4 < \infty$.

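Theorem 4 Let $(\hat\theta_n)_{n \ge 1}$ be a sequence of QMLEs satisfying (4) with initial
conditions

\[ x_0^2 = \dots = x_{1-q}^2 = \tilde\sigma_0^2 = \dots = \tilde\sigma_{1-p}^2 = c, \quad c = x_1^2. \tag{5} \]

Under assumptions A1-A4

\[ \hat\theta_n \to \theta_0 \quad \text{almost surely as } n \to \infty. \]

Theorem 5 Under assumptions A1-A6

\[ \sqrt{n}(\hat\theta_n - \theta_0) \xrightarrow[n \to \infty]{d} N\left(0, (\kappa_\eta - 1) J^{-1}\right), \tag{6} \]

where

\[ J := E_{\theta_0}\left(\frac{\partial^2 l_t(\theta_0)}{\partial\theta \, \partial\theta^T}\right) = E_{\theta_0}\left(\frac{1}{\sigma_t^4(\theta_0)} \frac{\partial\sigma_t^2(\theta_0)}{\partial\theta} \frac{\partial\sigma_t^2(\theta_0)}{\partial\theta^T}\right). \tag{7} \]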



With different assumptions, Theorem 4 was first proved by Berkes et al. (2003).
Theorem 5 was proved by Berkes et al. (2003) and by Hall and Yao (2003). Hall
and Yao (2003) also generalized the result to the case in which $E\eta_t^4 = \infty$ and
the distribution of $\eta_t^2$ is in the domain of attraction of a Gaussian or stable law
with exponent in $[1, 2)$.



3 Bootstrap methods 
3.1 Weighted bootstrap 

We define the bootstrap weights as a triangular sequence of random variables
$\tau_{ni}$ ($1 \le i \le n$, $n \ge 1$) independent from the process:

\[ \begin{array}{llll} \tau_{11} & & & \\ \tau_{21} & \tau_{22} & & \\ \vdots & & \ddots & \\ \tau_{n1} & \tau_{n2} & \dots & \tau_{nn} \end{array} \]

To verify the main results, we need some natural assumptions B1-B6 for the
bootstrap weights:

B1: the weights are independent from the GARCH process

B2: $P(\tau_{ni} \ge 0) = 1$, $1 \le i \le n$, $n \ge 1$

B3: for all $n$, the first four moments of $\tau_{n1}, \dots, \tau_{nn}$ are finite and equal

B4: $\lim_{n \to \infty} E\tau_{ni} = 1$, $i = 1, 2, \dots$

B5: $\gamma := \lim_{n \to \infty} E\tau_{ni}^2 < \infty$, $i = 1, 2, \dots$

B6: $R(\tau_{ni}, \tau_{nj}) \to 0$, $i \ne j$.

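The usual bootstrap procedure (corresponding to a multinomial distribution)
provides a suitable choice for weights, as it satisfies the six assumptions above.
This holds for the following weights as well (we shall use the first two in the
paper):

\[ (\tau_{n1}, \dots, \tau_{nn}) \sim \mathrm{Multinom}\left(n; \tfrac{1}{n}, \dots, \tfrac{1}{n}\right), \]

\[ (\tau_{n1}, \dots, \tau_{nn}) \sim \text{i.i.d. } \mathrm{Exp}(1), \]

\[ (\tau_{n1}, \dots, \tau_{nn}) \sim \text{i.i.d. } \Gamma(n, n). \]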

Calculating the Gaussian likelihood function for the weighted sample, we get
the following modified negative log-likelihood function, to be minimized:

\[ \tilde l_n^*(\theta) = \frac{1}{n} \sum_{t=1}^{n} \tilde l_{nt}^*(\theta), \quad \text{where} \quad \tilde l_{nt}^*(\theta) = \tau_{nt} \left( \frac{x_t^2}{\tilde\sigma_t^2(\theta)} + \log \tilde\sigma_t^2(\theta) \right). \]

For example, if the weights are (1, 2, 0, 1, 1), then the second element of the
sample is taken twice but the third one is omitted, etc.
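The weighted criterion differs from the original one only through the factors $\tau_{nt}$; the conditional variances are still computed from the original sample. The following is a minimal sketch for ARCH(1), assuming the initial value $x_0^2 = x_1^2$; the helper names `multinomial_weights`, `exponential_weights` and `weighted_criterion` are illustrative and not taken from the paper.

```python
import math
import random


def multinomial_weights(n, rng):
    """Multinom(n; 1/n, ..., 1/n) weights: counts of n draws with replacement."""
    weights = [0] * n
    for _ in range(n):
        weights[rng.randrange(n)] += 1
    return weights


def exponential_weights(n, rng):
    """i.i.d. Exp(1) weights."""
    return [rng.expovariate(1.0) for _ in range(n)]


def weighted_criterion(theta, x, weights):
    """Weighted negative Gaussian quasi log-likelihood for ARCH(1).

    theta = (omega, alpha). Assumption: the initial value x_0^2 is set to x_1^2.
    The variance recursion uses the original observations; only the
    summands are multiplied by the weights.
    """
    omega, alpha = theta
    prev_x2 = x[0] ** 2
    total = 0.0
    for xt, w in zip(x, weights):
        s = omega + alpha * prev_x2
        total += w * (xt * xt / s + math.log(s))
        prev_x2 = xt * xt
    return total / len(x)


rng = random.Random(0)
series = [0.3, -1.2, 0.8, 2.1, -0.4]
value = weighted_criterion((1.0, 0.5), series, multinomial_weights(len(series), rng))
```

With all weights equal to one the function returns the unweighted criterion, which is a convenient sanity check before the weights are randomized.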

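The bootstrap QMLE of the parameter $\theta$ is defined as the solution $\theta_n^*$ of

\[ \theta_n^* = \arg\min_{\theta \in \Theta} \tilde l_n^*(\theta). \tag{8} \]

Theorem 6 Let $(\theta_n^*)_{n \ge 1}$ be a sequence of bootstrap QMLEs satisfying (8) with
initial conditions (5). Under assumptions A1-A4 and B1-B4

\[ \theta_n^* \to \theta_0 \quad \text{almost surely as } n \to \infty. \]

Theorem 7 Under assumptions A1-A6 and B1-B6

\[ \sqrt{n}(\theta_n^* - \theta_0) \xrightarrow[n \to \infty]{d} N\left(0, \gamma(\kappa_\eta - 1) J^{-1}\right), \tag{9} \]

where

\[ J := E_{\theta_0}\left(\frac{\partial^2 l_t(\theta_0)}{\partial\theta \, \partial\theta^T}\right) = E_{\theta_0}\left(\frac{1}{\sigma_t^4(\theta_0)} \frac{\partial\sigma_t^2(\theta_0)}{\partial\theta} \frac{\partial\sigma_t^2(\theta_0)}{\partial\theta^T}\right). \]

The proofs of Theorems 6 and 7 can be found in the Appendix.

3.2 Residual bootstrap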

A residual bootstrap method was proposed by Hall and Yao (2003), who also
constructed one-sided bootstrap confidence intervals and analyzed their coverage
percentages by simulations for stationary ARCH(2) and GARCH(1,1) processes.
The construction of the residual bootstrap sample consists of the following steps,
which turn out to be useful if the sample is in its stationary distribution and
we apply a suitable burn-in period:

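1. Given a sample $\{x_1, \dots, x_n\}$, compute the QMLE $\hat\theta_n$:

\[ \hat\theta_n = \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{t=1}^{n} \tilde l_t(\theta). \]

2. Estimate the conditional variance $\sigma_t^2$ of the process:

\[ \hat\sigma_t = \sqrt{\tilde\sigma_t^2(\hat\theta_n)}, \quad t = 1, \dots, n. \]

3. Estimate the residuals $\hat\eta_t$:

\[ \hat\eta_t = \frac{x_t}{\hat\sigma_t}, \quad t = 1, \dots, n. \]

4. Calculate the standardized residuals $\tilde\eta_t$ from the $\hat\eta_t$, $t = 1, \dots, n$.

5. Draw a bootstrap sample with replacement from the standardized
residuals: $\{\eta_1^*, \dots, \eta_n^*\}$.

6. Using $\hat\theta_n$ and $\{\eta_1^*, \dots, \eta_n^*\}$, compute the residual bootstrap sample
$\{x_1^*, \dots, x_n^*\}$ of the process:

\[ x_t^* = \sigma_t^* \eta_t^*, \quad t = 1, \dots, n, \]

\[ (\sigma_t^*)^2 = \hat\omega + \sum_{i=1}^{q} \hat\alpha_i (x_{t-i}^*)^2 + \sum_{j=1}^{p} \hat\beta_j (\sigma_{t-j}^*)^2. \]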



By means of this residual bootstrap procedure, confidence intervals for
future values of the time series and for the volatilities $\sigma_t$ can also be constructed
(Pascual et al. 2006).



4 Simulations 

Although GARCH(1,1) models usually perform better, and surprisingly well
against other, more sophisticated models (see Hansen and Lunde 2005), for
the sake of simplicity we decided to illustrate the main results for stationary
ARCH(1) models (special case p=0, q=1 of Definition 1). So suppose that
$(X_t)_{t \in \mathbb{Z}}$ is generated by the ARCH(1) process

\[ X_t = \sqrt{\omega_0 + \alpha_0 X_{t-1}^2}\,\eta_t, \]

where $\eta_t$ ($t \in \mathbb{Z}$) are i.i.d. (0,1) random variables, and $\theta_0 = (\omega_0, \alpha_0)$, $\omega_0 > 0$,
$\alpha_0 \ge 0$ are the true parameters. The covariance matrix $(\kappa_\eta - 1) J^{-1}$ of the limit
distribution of the QMLE depends on the true parameters. We analyzed this
dependence in stationary ARCH(1) processes, where the parameters are $\omega_0 > 0$
and $0 < \alpha_0 < 1$. The matrix $J$ itself can only be approximated via simulations
derived from (7): for large $N$ and simulated data $(x_t)_{t=1,\dots,N}$,

\[ J \approx \hat J = \frac{1}{N} \sum_{t=1}^{N} \frac{1}{(\omega_0 + \alpha_0 x_{t-1}^2)^2} \begin{pmatrix} 1 & x_{t-1}^2 \\ x_{t-1}^2 & x_{t-1}^4 \end{pmatrix}. \]
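The Monte Carlo approximation of $J$ and of the limiting covariance matrix $(\kappa_\eta - 1) J^{-1}$ can be sketched as follows. The sketch assumes Gaussian innovations (so that $\kappa_\eta = 3$), starts the recursion at the unconditional variance and discards a burn-in period; the function name `arch1_limit_cov` and the default simulation length are illustrative and much smaller than the values used in the paper.

```python
import random


def arch1_limit_cov(omega0, alpha0, n_sim=200_000, kappa=3.0, burn_in=1000, seed=0):
    """Monte Carlo estimate of (kappa - 1) * J^{-1} for a stationary ARCH(1).

    Assumptions: Gaussian innovations, 0 <= alpha0 < 1, and a recursion
    started at the unconditional variance omega0 / (1 - alpha0).
    """
    rng = random.Random(seed)
    x2 = omega0 / (1.0 - alpha0)   # plays the role of x_{t-1}^2
    j11 = j12 = j22 = 0.0
    for t in range(n_sim + burn_in):
        h = omega0 + alpha0 * x2   # conditional variance at time t
        if t >= burn_in:
            j11 += 1.0 / h ** 2
            j12 += x2 / h ** 2
            j22 += x2 * x2 / h ** 2
        eta = rng.gauss(0.0, 1.0)
        x2 = h * eta * eta         # x_t^2, used in the next step
    j11, j12, j22 = j11 / n_sim, j12 / n_sim, j22 / n_sim
    det = j11 * j22 - j12 ** 2
    c = kappa - 1.0
    # Closed-form inverse of the symmetric 2 x 2 matrix J, scaled by (kappa - 1).
    return [[c * j22 / det, -c * j12 / det],
            [-c * j12 / det, c * j11 / det]]


cov = arch1_limit_cov(1.0, 0.5, n_sim=50_000)
```

Because all entries of $\hat J$ are positive, the estimated covariance between $\hat\omega$ and $\hat\alpha$ is always negative in this setting; with a much larger simulation length the output can be compared with the matrix reported in (10).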



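Figure 1 displays the contours of the elements of the limiting covariance matrix
if the innovations are Gaussian, based on $N = 10^8$ simulations, which provides
accurate results up to at least four digits. The variance of $\hat\omega$ and the covariance
between $\hat\omega$ and $\hat\alpha$ are both more sensitive to changes in $\omega_0$ than in $\alpha_0$. The
variance of the estimated parameter $\hat\alpha$ does not seem to depend on the true
parameter value $\omega_0$. This is not trivial from the theoretical results, as from (6) and (7)
we get

\[ \mathrm{var}(\hat\alpha) = \frac{(\kappa_\eta - 1)\, E_{\theta_0}\left(\frac{1}{(\omega_0 + \alpha_0 X_{t-1}^2)^2}\right)}{E_{\theta_0}\left(\frac{1}{(\omega_0 + \alpha_0 X_{t-1}^2)^2}\right) E_{\theta_0}\left(\frac{X_{t-1}^4}{(\omega_0 + \alpha_0 X_{t-1}^2)^2}\right) - \left(E_{\theta_0}\left(\frac{X_{t-1}^2}{(\omega_0 + \alpha_0 X_{t-1}^2)^2}\right)\right)^2}, \]

which needs further investigation.

Figure 1: Contours of the elements of the limiting covariance matrix, ARCH(1)
process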



From now on we will concentrate on the ARCH(1) process with parameters $\omega_0 = 1$
and $\alpha_0 = 0.5$. Then the limiting covariance matrix of the QML estimation is

\[ \begin{pmatrix} 4.893 & -2.148 \\ -2.148 & 3.926 \end{pmatrix}. \tag{10} \]

Unfortunately, at least $10^6$ replications are needed to estimate
the matrix confidently, which takes several hours on an i7 computer with 8 GB of RAM.
We will see that even the bootstrap cannot help much if we draw too few
samples.

We drew $10^6$ samples with Gaussian innovations of size 100 to 5000 and
calculated the covariance matrix of the QML estimations. Figure 2 shows that the
convergence improves drastically while the sample size is below 1000 and
only slightly after that. We also found for other pairs of parameters that, with
simulations of sample size 2000, the covariance matrix can be estimated quite
well, within a 1% margin.




Figure 2: Convergence of the sample covariance matrix, ARCH(1) process, $\omega_0 = 1$
and $\alpha_0 = 0.5$

After that, 50000 samples of size n=500, 1000 and 2000 were generated with
standard Gaussian and Student's t distributed innovations with 5 and 3 degrees
of freedom, and we estimated the parameters with the QML method described
in Section 2. Boxplots of the sum of absolute errors (SAE) are depicted in Figure 3.
The SAE is defined as $|\hat\omega - \omega_0| + |\hat\alpha - \alpha_0|$. We can see that the heavier the tails of the
innovations are, the larger the SAE is. Note that the Student's t errors with 3
degrees of freedom have an infinite fourth moment, so Theorem 5 does not apply,
but the quasi-maximum likelihood estimates are still fairly close on average to the
original parameters. As the sample size increases, the SAEs of course become
smaller. Figure 3 does not display all SAE values for the Student's t innovations:
the results for some samples are so bad that the SAE of the estimated parameters
is more than 100.

Figure 3: Boxplots of the sum of absolute errors (SAE) of the parameters if the
innovations are standard Gaussian, Student's t with 5 and 3 degrees of freedom
for different sample sizes: (a.) n=500; (b.) n=1000; (c.) n=2000

If we take multinomially distributed weights, then the scaling factor of the
covariance matrix is $\gamma = \lim_{n \to \infty} E\tau_{ni}^2 = \lim_{n \to \infty} (2 - \frac{1}{n}) = 2$, therefore the element-wise
quotient of the two matrices must be near 2. Figure 4 displays the
convergence of the elements of the sample covariance matrix, divided element-wise
by the theoretical covariance matrix (10), if the sample matrices are calculated
with the multinomially weighted bootstrap (panels (a.) and (b.)) or with the
residual bootstrap (panels (c.) and (d.)), for sample sizes ranging from 100 to
2000. Panels (a.) and (c.) show the convergence based on R = 1000 samples
which were bootstrapped B = 1000 times, while the other two panels display
simulations with R = 10000 and B = 100. The dashed lines are the sample
covariance matrix values without bootstrap weights, divided by the theoretical
values and scaled to 2.
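The value $2 - \frac{1}{n}$ of the second moment follows from the fact that each multinomial weight is marginally Binomial$(n, \frac{1}{n})$, with mean 1 and variance $1 - \frac{1}{n}$. The following short check computes the second moment directly from the binomial probabilities; the function name is an illustrative choice.

```python
import math


def second_moment_multinomial_weight(n):
    """E(tau^2) for tau ~ Binomial(n, 1/n), the marginal law of one
    multinomial bootstrap weight, computed from the probability mass function."""
    p = 1.0 / n
    return sum(k * k * math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
               for k in range(n + 1))


# Compare the direct computation with the closed form 2 - 1/n.
for n in (5, 50, 500):
    print(n, second_moment_multinomial_weight(n), 2.0 - 1.0 / n)
```

The two columns agree up to floating-point error, and the common value approaches 2 as $n$ grows.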

Unfortunately, the convergence in Theorem 5 is not fast. In panel (a.) of
Figure 4 we cannot see a clear convergence: the bootstrap cannot substantially
improve the properties of the original samples, it only decreases the differences.
Panel (b.) of Figure 4 helps to understand the reason: R = 1000 samples
were too few. If we raise the number of samples to R = 10000, and (for
practical reasons) decrease the bootstrap repetitions to B = 100, the
convergence becomes quite good. Looking at the simulations, it is not obvious which
of the two bootstrap methods is the better one.

After that, we constructed 95% confidence intervals for the GARCH parameters
with Gaussian innovations. Table 1 contains the average coverage percentage
of confidence intervals for the parameters $\omega$ and $\alpha$ for different sample sizes
(500, 1000, 2000) and using residual or weighted bootstrap methods, always
compared to the Monte Carlo empirical confidence intervals. For a sample size
of n = 500, the residual bootstrap outperformed the weighted bootstrap, but
for sample size 2000, the weighted bootstrap performed mostly better than the
residual bootstrap. Using the weighted bootstrap, the average coverage of the
confidence intervals improved with increasing sample size, which cannot be stated
in the case of the residual bootstrap.

Figure 4: Convergence of the sample covariance matrix, ARCH(1) process, $\omega_0 = 1$
and $\alpha_0 = 0.5$; (a.) Weighted bootstrap with multinomial weights, R=1000
and B=1000; (b.) Weighted bootstrap with multinomial weights, R=10000 and
B=100; (c.) Residual bootstrap, R=1000 and B=1000; (d.) Residual bootstrap,
R=10000 and B=100

Using the limiting distributions (6) and (9) of the quasi-maximum likelihood
estimator and its weighted bootstrap version, confidence sets can also be
constructed. For the limiting distribution of the residual bootstrap QMLE, see Hall
and Yao (2003). Table 2 reports the average coverage of the confidence sets: the
row 'Empirical' contains the 95% and 99% coverage of R = 1000 samples, while
the other two rows show the coverage of residual and weighted bootstrap QML
estimates with R = 1000 samples and B = 1000 bootstrap replications. Note
that in each case the weighted bootstrap QMLEs performed a bit better than
the residual ones. Figure 5 shows the estimated pairs of parameters $(\hat\omega, \hat\alpha)$
and the 95% and 99% confidence sets, according to the limiting distribution,
for different sample sizes (500, 1000, 2000). It can be seen that the confidence
ellipses have a tilted major axis and that the larger the sample size is, the
smaller the ellipses become. Panels a.)-c.) were plotted for R = 1000 samples
and panels d.)-f.) were plotted for the weighted bootstrap QMLEs,
bootstrapped B = 100 times. Comparing the points with the confidence sets,
the coverage looks quite decent, and there are no clusters outside the
ellipses.
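Membership in a confidence set based on the limiting normal distribution can be checked with a quadratic form. The sketch below assumes a two-dimensional parameter, an ellipse centered at a reference parameter value, and uses the fact that the chi-square quantile with 2 degrees of freedom has the closed form $-2\log(1 - \text{level})$; the function name `in_confidence_set` and the optional scaling argument `gamma` (for the weighted bootstrap) are illustrative assumptions.

```python
import math


def in_confidence_set(theta_hat, theta_ref, cov, n, level=0.95, gamma=1.0):
    """Check whether n * d^T (gamma * cov)^{-1} d is below the chi-square(2)
    quantile, where d = theta_hat - theta_ref and cov is a 2 x 2 limiting
    covariance matrix. Assumption: the set is centered at theta_ref.
    """
    d0 = theta_hat[0] - theta_ref[0]
    d1 = theta_hat[1] - theta_ref[1]
    a, b, c = gamma * cov[0][0], gamma * cov[0][1], gamma * cov[1][1]
    det = a * c - b * b
    # Quadratic form with the closed-form inverse of a symmetric 2 x 2 matrix.
    quad = n * (c * d0 * d0 - 2.0 * b * d0 * d1 + a * d1 * d1) / det
    return quad <= -2.0 * math.log(1.0 - level)


# Limiting covariance matrix (10) for omega_0 = 1, alpha_0 = 0.5.
cov10 = [[4.893, -2.148], [-2.148, 3.926]]
inside = in_confidence_set((1.05, 0.48), (1.0, 0.5), cov10, n=1000)
outside = in_confidence_set((1.5, 0.5), (1.0, 0.5), cov10, n=1000)
```

The share of estimates for which the function returns True is the empirical coverage of the corresponding confidence set.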



5 Conclusions 

We have demonstrated that the multiplier bootstrap method reflects well the
properties of the original QML estimator, thus it may be used for investigating
the estimators in practical problems (we plan to come back to this issue in
another paper soon).

Another important observation of our simulations is that the asymptotic results
presented in Sections 2 and 3 can be used for sample sizes in the range of
thousands only, as for smaller samples the deviations may still be substantial.
It is also worth mentioning that we have found an interesting dependence
between the asymptotic covariance matrix and the parameter values themselves,
which should be taken into account in practical applications.

Sample  Method        Average coverage    Average coverage    Average coverage
size                                      below               above
                      omega    alpha      omega    alpha      omega    alpha
        Monte Carlo   95%      95%        2.5%     2.5%       2.5%     2.5%
500     RB            94.93    95.07      2.12     2.35       2.94     2.58
        WB            94.19    94.23      2.66     2.29       3.15     3.47
1000    RB            95.47    95.52      2.61     1.99       1.92     2.49
        WB            94.81    94.88      3.06     1.93       2.13     3.19
2000    RB            95.29    94.77      2.26     2.18       2.44     3.06
        WB            94.74    95.07      2.62     2.21       2.64     2.72

Table 1: Average coverage percentages of confidence intervals for the parameters
$\omega$ and $\alpha$ for sample sizes 500, 1000, 2000 and using residual bootstrap (RB) or
weighted bootstrap (WB) methods.

Method       n = 500           n = 1000          n = 2000
             95%      99%      95%      99%      95%      99%
Empirical    95.40    99.20    96.20    98.90    95.90    99.20
RB           95.84    99.12    95.97    99.19    96.02    99.24
WB           95.67    99.02    95.91    99.13    95.98    99.23

Table 2: Average coverage of the 95% and 99% confidence sets for sample sizes
500, 1000, 2000; using residual bootstrap (RB) or weighted bootstrap (WB)
methods.

7 Appendix



Figure 5: Pairs of estimated parameters $(\hat\omega, \hat\alpha)$ and the 95% and 99% confidence
sets, according to the limiting distribution, for different sample sizes; R = 1000
empirical estimated parameters (a.) n=500, b.) n=1000, c.) n=2000) and weighted bootstrap (WB)
estimators (d.) n=500, e.) n=1000, f.) n=2000; B = 100).

Proof of Theorem 6

We follow the proof of Francq and Zakoian (2004) and go into details only when
changes are necessary. See the original proof in their paper or in their book
(Francq and Zakoian (2010)) on pages 156-159.

First, we introduce some notation to write the system of equations

\[ \sigma_t^2 = \omega + \sum_{i=1}^{q} \alpha_i X_{t-i}^2 + \sum_{j=1}^{p} \beta_j \sigma_{t-j}^2, \quad t \in \mathbb{Z} \]

in matrix form. Let

\[ \underline{\sigma}_t^2 = \begin{pmatrix} \sigma_t^2 \\ \sigma_{t-1}^2 \\ \vdots \\ \sigma_{t-p+1}^2 \end{pmatrix}, \qquad \underline{c}_t = \begin{pmatrix} \omega + \sum_{i=1}^{q} \alpha_i X_{t-i}^2 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \qquad B = \begin{pmatrix} \beta_1 & \beta_2 & \cdots & \beta_p \\ 1 & 0 & \cdots & 0 \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 1 & 0 \end{pmatrix}. \]

So we have

\[ \underline{\sigma}_t^2 = \underline{c}_t + B \underline{\sigma}_{t-1}^2, \quad t \in \mathbb{Z}. \tag{11} \]

Let us denote by $B_k(\theta)$ the open sphere with center $\theta$ and radius $k$.

The proof consists of five steps and we also need a modification of the ergodic
theorem.
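(I.) The initial values are asymptotically irrelevant:

\[ \lim_{n \to \infty} \sup_{\theta \in \Theta} \left| l_n^*(\theta) - \tilde l_n^*(\theta) \right| = 0 \quad \text{a.s.} \tag{12} \]

Iterating (11), we get that for some appropriate $K > 0$ and $0 < \rho < 1$

\[ \sup_{\theta \in \Theta} \left\| \underline{\sigma}_t^2(\theta) - \underline{\tilde\sigma}_t^2(\theta) \right\| \le K \rho^t \quad \text{for all } t. \tag{13} \]

Therefore

\[ \sup_{\theta \in \Theta} \left| l_n^*(\theta) - \tilde l_n^*(\theta) \right| \le \frac{1}{n} \sum_{t=1}^{n} \sup_{\theta \in \Theta} \left\{ \tau_{nt} \left| \frac{\tilde\sigma_t^2 - \sigma_t^2}{\tilde\sigma_t^2 \sigma_t^2} \right| X_t^2 + \tau_{nt} \left| \log \frac{\sigma_t^2}{\tilde\sigma_t^2} \right| \right\} \]

\[ \le \left( \sup_{\theta \in \Theta} \frac{1}{\omega^2} \right) \frac{K}{n} \sum_{t=1}^{n} \tau_{nt} \rho^t X_t^2 + \left( \sup_{\theta \in \Theta} \frac{1}{\omega} \right) \frac{K}{n} \sum_{t=1}^{n} \tau_{nt} \rho^t. \]

To prove (12), it is sufficient to show that

\[ \frac{1}{n} \sum_{t=1}^{n} \tau_{nt} \rho^t X_t^2 \xrightarrow[n \to \infty]{\text{a.s.}} 0 \tag{14} \]

and

\[ \frac{1}{n} \sum_{t=1}^{n} \tau_{nt} \rho^t \xrightarrow[n \to \infty]{\text{a.s.}} 0. \tag{15} \]

For arbitrary $\delta > 0$

\[ \sum_{t=0}^{\infty} P\left(\tau_{nt} \rho^t X_t^2 > \delta\right) \le \sum_{t=0}^{\infty} \frac{\rho^{st} E(\tau_{nt}^s X_t^{2s})}{\delta^s} = \frac{E(\tau_{nt}^s) E(X_t^{2s})}{(1 - \rho^s)\delta^s} < \infty \]

and

\[ \sum_{t=0}^{\infty} P\left(\tau_{nt} \rho^t > \delta\right) \le \sum_{t=0}^{\infty} \frac{\rho^t E(\tau_{nt})}{\delta} = \frac{E(\tau_{nt})}{(1 - \rho)\delta} < \infty. \]

In the estimation above we applied Markov's inequality and Theorem 3.
Using the Borel-Cantelli lemma, we get

\[ P\left( \lim_{t \to \infty} \tau_{nt} \rho^t X_t^2 = 0 \right) = 1 \]

and

\[ P\left( \lim_{t \to \infty} \tau_{nt} \rho^t = 0 \right) = 1. \]

Finally, using Cesaro's lemma, (14) and (15) are proved.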



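(II.) Identifiability of the parameter

\[ \exists t \in \mathbb{Z} \text{ such that } \sigma_t^2(\theta) = \sigma_t^2(\theta_0) \;\; P_{\theta_0}\text{-a.s.} \implies \theta = \theta_0. \]

For details, see Francq and Zakoian (2010), page 158.

(III.) The log likelihood function is integrable at $\theta_0$ and it has a unique
minimum at the true value

\[ E_{\theta_0} |l_{nt}^*(\theta_0)| < \infty \quad \text{and if } \theta \ne \theta_0, \quad E_{\theta_0} l_{nt}^*(\theta) > E_{\theta_0} l_{nt}^*(\theta_0). \]

It is easy to show that $E_{\theta_0} l_{nt}^*(\theta) \in \mathbb{R} \cup \{\infty\}$, because

\[ E_{\theta_0} [l_{nt}^*(\theta)]^- \le E(\tau_{nt}) \cdot E_{\theta_0} \log^- \sigma_t^2(\theta) \le E(\tau_{nt}) \cdot \log^-(\omega) < \infty. \]

The log likelihood function is integrable at $\theta_0$:

\[ E_{\theta_0} l_{nt}^*(\theta_0) = E_{\theta_0} \left[ \tau_{nt} \left( \frac{\sigma_t^2(\theta_0) \eta_t^2}{\sigma_t^2(\theta_0)} + \log \sigma_t^2(\theta_0) \right) \right] = E(\tau_{nt}) \cdot E_{\theta_0} \left( \eta_t^2 + \log \sigma_t^2(\theta_0) \right) = E(\tau_{nt}) \cdot \left( 1 + E_{\theta_0} \log \sigma_t^2(\theta_0) \right) < \infty. \]

The limit criterion is minimized at the true value $\theta_0$:

\[ E_{\theta_0} l_{nt}^*(\theta) - E_{\theta_0} l_{nt}^*(\theta_0) = E(\tau_{nt}) \cdot E_{\theta_0} \left[ \log \frac{\sigma_t^2(\theta)}{\sigma_t^2(\theta_0)} + \frac{\sigma_t^2(\theta_0)}{\sigma_t^2(\theta)} - 1 \right] \ge E(\tau_{nt}) \cdot E_{\theta_0} \left[ \log \frac{\sigma_t^2(\theta)}{\sigma_t^2(\theta_0)} + \log \frac{\sigma_t^2(\theta_0)}{\sigma_t^2(\theta)} \right] = 0, \]

where the equality holds iff $\sigma_t^2(\theta) = \sigma_t^2(\theta_0)$ $P_{\theta_0}$-a.s., and as a consequence of (II.),
this is equivalent to $\theta = \theta_0$.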

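(IV.) For any $\theta \ne \theta_0$, there exists a neighborhood $V(\theta)$ such that

\[ \liminf_{n \to \infty} \inf_{\theta^* \in V(\theta)} \tilde l_n^*(\theta^*) > E_{\theta_0} l_1(\theta_0) \quad \text{a.s.} \]

To prove this, we use (I.) and a consequence of the ergodic theorem.

\[ \liminf_{n \to \infty} \inf_{\theta^* \in V_{1/k}(\theta) \cap \Theta} \tilde l_n^*(\theta^*) \ge \liminf_{n \to \infty} \inf_{\theta^* \in V_{1/k}(\theta) \cap \Theta} l_n^*(\theta^*) - \limsup_{n \to \infty} \sup_{\theta \in \Theta} \left| l_n^*(\theta) - \tilde l_n^*(\theta) \right| \]

\[ \overset{(I.)}{\ge} \liminf_{n \to \infty} \frac{1}{n} \sum_{t=1}^{n} \inf_{\theta^* \in V_{1/k}(\theta) \cap \Theta} l_{nt}^*(\theta^*) = E_{\theta_0} \inf_{\theta^* \in V_{1/k}(\theta) \cap \Theta} l_1(\theta^*). \]

In the last equation, we used that $\inf l_{nt}^*(\theta^*)$ is an ergodic process. The expression
$\inf_{\theta^* \in V_{1/k}(\theta) \cap \Theta} l_1(\theta^*)$ is monotonically increasing in $k$, so $E_{\theta_0} \inf_{\theta^* \in V_{1/k}(\theta) \cap \Theta} l_1(\theta^*)$ is also
monotonically increasing and, using Beppo Levi's theorem,

\[ E_{\theta_0} \inf_{\theta^* \in V_{1/k}(\theta) \cap \Theta} l_1(\theta^*) \xrightarrow[k \to \infty]{} E_{\theta_0} l_1(\theta). \]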

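(V.) Last step of the proof, using the compactness of $\Theta$.

For any neighborhood $V(\theta_0)$ of $\theta_0$,

\[ \limsup_{n \to \infty} \inf_{\theta^* \in V(\theta_0)} \tilde l_n^*(\theta^*) \le \lim_{n \to \infty} \tilde l_n^*(\theta_0) = \lim_{n \to \infty} l_n^*(\theta_0) = E_{\theta_0} l_1(\theta_0). \tag{16} \]

As $\Theta$ is a compact set, by definition, there exist open subsets $V(\theta_0), V(\theta_1), \dots, V(\theta_k)$
of $\mathbb{R}^{p+q+1}$, for which $\Theta \subseteq \bigcup_{i=0}^{k} V(\theta_i)$ and $V(\theta_1), \dots, V(\theta_k)$ satisfy (IV.).
So

\[ \inf_{\theta \in \Theta} \tilde l_n^*(\theta) = \min_{0 \le i \le k} \inf_{\theta \in \Theta \cap V(\theta_i)} \tilde l_n^*(\theta). \]

As a consequence of (IV.) and (16), for $n$ large enough $\theta_n^* \in V(\theta_0)$ with
probability 1. This is true for any neighborhood $V(\theta_0)$, therefore

\[ \theta_n^* \to \theta_0 \quad \text{a.s.} \qquad \Box \]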



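Proof of Theorem 7

We follow the proof of Francq and Zakoian (2004) and go into details only when
changes are necessary. See the original proof in their paper or in their book
(Francq and Zakoian (2010)) on pages 159-168.

The Taylor expansion of the function around $\theta_0$ is

\[ \tilde l_{nt}^*(\theta) = \tilde l_{nt}^*(\theta_0) + \frac{\partial \tilde l_{nt}^*(\bar\theta)}{\partial\theta^T} (\theta - \theta_0), \]

where $\bar\theta$ is between $\theta_0$ and $\theta$.

Differentiating, summing and multiplying this equation by $\frac{1}{\sqrt{n}}$, we get

\[ 0 \overset{(A5)}{=} \frac{1}{\sqrt{n}} \sum_{t=1}^{n} \frac{\partial}{\partial\theta} \tilde l_{nt}^*(\theta_n^*) = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} \frac{\partial}{\partial\theta} \tilde l_{nt}^*(\theta_0) + \left( \frac{1}{n} \sum_{t=1}^{n} \frac{\partial^2}{\partial\theta \, \partial\theta^T} \tilde l_{nt}^*(\bar\theta) \right) \sqrt{n}(\theta_n^* - \theta_0), \]

where $\bar\theta$ is between $\theta_0$ and $\theta_n^*$.
We will show that

\[ \frac{1}{\sqrt{n}} \sum_{t=1}^{n} \frac{\partial}{\partial\theta} \tilde l_{nt}^*(\theta_0) \xrightarrow[n \to \infty]{d} N\left(0, \gamma(\kappa_\eta - 1) J\right) \tag{17} \]

and

\[ \frac{1}{n} \sum_{t=1}^{n} \frac{\partial^2}{\partial\theta_i \, \partial\theta_j} \tilde l_{nt}^*(\bar\theta) \to J(i,j) \quad \text{in probability.} \tag{18} \]

The proof consists of six steps.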

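(I.) Integrability of the second-order derivatives of $l_{nt}^*(\theta)$ at $\theta_0$

\[ E_{\theta_0} \left\| \frac{\partial^2 l_{nt}^*(\theta_0)}{\partial\theta \, \partial\theta^T} \right\| < \infty. \]

As $E\tau_{nt} < \infty$ and $\tau_{nt}$ is independent from $l_t(\theta_0)$, it is sufficient to show that

\[ E_{\theta_0} \left\| \frac{\partial^2 l_t(\theta_0)}{\partial\theta \, \partial\theta^T} \right\| < \infty, \]

which is proven in Francq and Zakoian (2010), on pages 160-162.

(II.) $J$ is invertible and $\mathrm{Var}_{\theta_0}\left( \frac{\partial l_{nt}^*(\theta_0)}{\partial\theta} \right) = E\tau_{nt}^2 \cdot (\kappa_\eta - 1) J$

The invertibility of $J$ is verified in Francq and Zakoian (2010), on page 163.
Using (I.), $E\tau_{nt} < \infty$ and the independence between $\tau_{nt}$ and $l_t(\theta_0)$, we have

\[ E_{\theta_0} \left( \frac{\partial l_{nt}^*(\theta_0)}{\partial\theta} \right) = E\tau_{nt} \cdot E_{\theta_0}(1 - \eta_t^2) \cdot E_{\theta_0} \left( \frac{1}{\sigma_t^2(\theta_0)} \frac{\partial\sigma_t^2(\theta_0)}{\partial\theta} \right) = 0. \]

Then we obtain

\[ \mathrm{Var}_{\theta_0} \left( \frac{\partial l_{nt}^*(\theta_0)}{\partial\theta} \right) = E_{\theta_0} \left( \frac{\partial l_{nt}^*(\theta_0)}{\partial\theta} \frac{\partial l_{nt}^*(\theta_0)}{\partial\theta^T} \right) = E\tau_{nt}^2 \cdot E_{\theta_0}(1 - \eta_t^2)^2 \cdot E_{\theta_0} \left( \frac{1}{\sigma_t^4(\theta_0)} \frac{\partial\sigma_t^2(\theta_0)}{\partial\theta} \frac{\partial\sigma_t^2(\theta_0)}{\partial\theta^T} \right) = E\tau_{nt}^2 \cdot (\kappa_\eta - 1) \cdot J. \]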



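(III.) Uniform integrability of the third-order derivatives of $l_{nt}^*(\theta)$ at $\theta_0$
There exists a neighborhood $V(\theta_0)$ of $\theta_0$ such that, for all $i, j, k \in \{1, \dots, p+q+1\}$,

\[ E_{\theta_0} \sup_{\theta \in V(\theta_0)} \left| \frac{\partial^3 l_{nt}^*(\theta)}{\partial\theta_i \, \partial\theta_j \, \partial\theta_k} \right| < \infty. \]

As $E\tau_{nt} < \infty$ and $\tau_{nt}$ is independent from $l_t(\theta)$, it is sufficient to show that

\[ E_{\theta_0} \sup_{\theta \in V(\theta_0)} \left| \frac{\partial^3 l_t(\theta)}{\partial\theta_i \, \partial\theta_j \, \partial\theta_k} \right| < \infty, \]

which is proven in Francq and Zakoian (2010), on pages 163-165.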
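(IV.) The initial values are asymptotically irrelevant:

\[ \left\| \frac{1}{\sqrt{n}} \sum_{t=1}^{n} \left( \frac{\partial l_{nt}^*(\theta_0)}{\partial\theta} - \frac{\partial \tilde l_{nt}^*(\theta_0)}{\partial\theta} \right) \right\| \to 0 \tag{19} \]

and

\[ \sup_{\theta \in V(\theta_0)} \left\| \frac{1}{n} \sum_{t=1}^{n} \left( \frac{\partial^2 l_{nt}^*(\theta)}{\partial\theta \, \partial\theta^T} - \frac{\partial^2 \tilde l_{nt}^*(\theta)}{\partial\theta \, \partial\theta^T} \right) \right\| \to 0. \tag{20} \]

Using the results of Francq and Zakoian (2010) (pages 165-166) we have

\[ \left| \frac{\partial l_{nt}^*(\theta_0)}{\partial\theta_i} - \frac{\partial \tilde l_{nt}^*(\theta_0)}{\partial\theta_i} \right| \le K \tau_{nt} \rho^t (1 + \eta_t^2) \left| 1 + \frac{1}{\sigma_t^2(\theta_0)} \frac{\partial\sigma_t^2(\theta_0)}{\partial\theta_i} \right|. \]

So we obtain the estimate

\[ \left| \frac{1}{\sqrt{n}} \sum_{t=1}^{n} \left( \frac{\partial l_{nt}^*(\theta_0)}{\partial\theta_i} - \frac{\partial \tilde l_{nt}^*(\theta_0)}{\partial\theta_i} \right) \right| \le \frac{K}{\sqrt{n}} \sum_{t=1}^{n} \tau_{nt} \rho^t (1 + \eta_t^2) \left| 1 + \frac{1}{\sigma_t^2(\theta_0)} \frac{\partial\sigma_t^2(\theta_0)}{\partial\theta_i} \right|. \]

Markov's inequality and the independence between $\tau_{nt}$, $\eta_t$ and $\sigma_t^2(\theta_0)$ imply that,
for all $\varepsilon > 0$,

\[ P\left( \frac{K}{\sqrt{n}} \sum_{t=1}^{n} \tau_{nt} \rho^t (1 + \eta_t^2) \left| 1 + \frac{1}{\sigma_t^2(\theta_0)} \frac{\partial\sigma_t^2(\theta_0)}{\partial\theta_i} \right| > \varepsilon \right) \le \frac{2K}{\varepsilon \sqrt{n}} \left( 1 + E_{\theta_0} \left| \frac{1}{\sigma_t^2(\theta_0)} \frac{\partial\sigma_t^2(\theta_0)}{\partial\theta_i} \right| \right) \sum_{t=1}^{n} \rho^t E\tau_{nt}, \]

where $0 < \rho < 1$.

To show (19), it is sufficient to prove that $\lim_{n \to \infty} \sum_{t=1}^{n} \rho^t E\tau_{nt} < \infty$:

\[ \lim_{n \to \infty} \sum_{t=1}^{n} \rho^t E\tau_{nt} = \lim_{n \to \infty} E\tau_{n1} \sum_{t=1}^{n} \rho^t = \frac{\rho}{1 - \rho} < \infty, \]

where we used B3 and B4.

(20) can be proven similarly.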

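(V.) Using the martingale CLT (or Lindeberg's CLT), we prove that

\[ \frac{1}{\sqrt{n}} \sum_{t=1}^{n} \frac{\partial l_{nt}^*}{\partial\theta}(\theta_0) \xrightarrow[n \to \infty]{d} N\left(0, \gamma(\kappa_\eta - 1) J\right). \tag{21} \]

Let $\mathcal{F}_{nt} = \mathcal{F}_t = \sigma(\{X_t, X_{t-1}, \dots\})$ and,
for all $\lambda \in \mathbb{R}^{p+q+1}$, $\nu_{nt} = \frac{1}{\sqrt{n}} \lambda^T \frac{\partial l_{nt}^*}{\partial\theta}(\theta_0) = \frac{1}{\sqrt{n}} \tau_{nt} \lambda^T \frac{\partial l_t}{\partial\theta}(\theta_0)$.

So for every $n$, $(\nu_{nt}, \mathcal{F}_{nt})_{t \in \mathbb{Z}}$ is a square integrable martingale difference.

Let us denote $\sigma_{nt}^2 = E_{\theta_0}(\nu_{nt}^2 \mid \mathcal{F}_{t-1})$; the process
$(\sigma_{nt}^2)_{t=1,\dots,n}$ is stationary and ergodic.

As a consequence, using B6 for Bernstein's theorem,

\[ \sum_{t=1}^{n} \sigma_{nt}^2 = \frac{1}{n} \sum_{t=1}^{n} E_{\theta_0} \left( \tau_{nt}^2 \left( \lambda^T \frac{\partial l_t}{\partial\theta}(\theta_0) \right)^2 \Big| \mathcal{F}_{t-1} \right) \to \lim_{n \to \infty} E\tau_{nt}^2 \cdot E_{\theta_0} \left( \lambda^T \frac{\partial l_t}{\partial\theta}(\theta_0) \right)^2 = \gamma \cdot (\kappa_\eta - 1) \cdot \lambda^T J \lambda. \]

We also have, for all $\varepsilon > 0$,

\[ \sum_{t=1}^{n} E_{\theta_0} \left[ \nu_{nt}^2 \, 1(|\nu_{nt}| \ge \varepsilon) \right] = \sum_{t=1}^{n} \frac{1}{n} \int_{\left\{ \left| \tau_{nt} \lambda^T \frac{\partial l_t}{\partial\theta}(\theta_0) \right| \ge \sqrt{n}\varepsilon \right\}} \tau_{nt}^2 \left( \lambda^T \frac{\partial l_t}{\partial\theta}(\theta_0) \right)^2 dP_{\theta_0} = \int_{\left\{ \left| \tau_{n1} \lambda^T \frac{\partial l_1}{\partial\theta}(\theta_0) \right| \ge \sqrt{n}\varepsilon \right\}} \tau_{n1}^2 \left( \lambda^T \frac{\partial l_1}{\partial\theta}(\theta_0) \right)^2 dP_{\theta_0} \to 0. \]

At the second equality we used the stationarity of the process.
Using the martingale CLT on the process $(\nu_{nt}, \mathcal{F}_{nt})_{t \in \mathbb{Z}}$ and then the
Cramer-Wold theorem, (21) is proved.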



(VI.) Using the second-order derivatives in the Taylor expansion of $\tilde l_{nt}^*$, it can
be seen that

\[ \frac{1}{n} \sum_{t=1}^{n} \frac{\partial^2}{\partial\theta_i \, \partial\theta_j} \tilde l_{nt}^*(\bar\theta) \to J(i,j). \]

Finally, combining (IV.), (V.) and (VI.) and applying Slutsky's lemma to the
first-order derivative of the Taylor expansion of $\tilde l_{nt}^*$, (9) is proved. $\Box$